BUG: HDFStore failures on timezone-aware data #20595

Closed
adshieh wants to merge 4 commits into master

Conversation


@adshieh adshieh commented Apr 3, 2018


codecov bot commented Apr 3, 2018

Codecov Report

Merging #20595 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20595      +/-   ##
==========================================
+ Coverage   91.82%   91.82%   +<.01%     
==========================================
  Files         153      153              
  Lines       49256    49259       +3     
==========================================
+ Hits        45229    45232       +3     
  Misses       4027     4027
Flag Coverage Δ
#multiple 90.21% <82.35%> (-0.01%) ⬇️
#single 41.93% <100%> (+0.02%) ⬆️
Impacted Files Coverage Δ
pandas/io/pytables.py 92.46% <100%> (ø) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73cb32e...b0c8f82.

@@ -2793,10 +2793,20 @@ def test_empty_series_frame(self):
self._check_roundtrip(df2, tm.assert_frame_equal)

def test_empty_series(self):
-        for dtype in [np.int64, np.float64, np.object, 'm8[ns]', 'M8[ns]']:
+        for dtype in [np.int64, np.float64, np.object, 'm8[ns]', 'M8[ns]',
Member

Can you use @pytest.mark.parametrize to parametrize this test over dtype instead of using a for loop?
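
For illustration, a minimal sketch of the parametrized form being requested (not the PR's exact diff; the dtype values mirror the hunks shown on this page, and self._check_roundtrip and tm come from the existing test class):

    @pytest.mark.parametrize('dtype', [np.int64, np.float64, np.object,
                                       'm8[ns]', 'M8[ns]',
                                       'datetime64[ns, UTC]'])
    def test_empty_series(self, dtype):
        # pytest runs this once per dtype instead of looping inside the test
        s = Series(dtype=dtype)
        self._check_roundtrip(s, tm.assert_series_equal)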

Author

Done!

@@ -1099,6 +1099,7 @@ I/O
- Bug in :meth:`pandas.io.json.json_normalize` where subrecords are not properly normalized if any subrecords values are NoneType (:issue:`20030`)
- Bug in ``usecols`` parameter in :func:`pandas.io.read_csv` and :func:`pandas.io.read_table` where error is not raised correctly when passing a string. (:issue:`20529`)
- Bug in :func:`HDFStore.keys` when reading a file with a softlink causes exception (:issue:`20523`)
- Bug in :class:`HDFStore` for Series and empty DataFrames with timezone-aware data (:issue:`20594`)
Contributor

This is not very clear. Is this for Series generally and for empty DataFrames? This is only for a fixed store.

@@ -2771,7 +2765,11 @@ def read(self, **kwargs):
def write(self, obj, **kwargs):
super(SeriesFixed, self).write(obj, **kwargs)
self.write_index('index', obj.index)
self.write_array('values', obj.values)
if is_datetime64tz_dtype(obj.dtype):
Contributor

This is reaching into the internal implementation way too much.

You can use obj.get_values() here.

Author

@adshieh adshieh Apr 5, 2018

The problem with obj.get_values() is that it returns an array with dtype datetime64[ns], but we need a DatetimeIndex to preserve the timezone information. How about obj.dt._get_values()?
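
To illustrate the point being made here (a standalone sketch, not code from the PR; .values shows the same loss of timezone that the comment describes for get_values()):

    import pandas as pd

    s = pd.Series(pd.date_range('2018-01-01', periods=2, tz='US/Eastern'))
    s.dtype                 # datetime64[ns, US/Eastern]
    s.values.dtype          # dtype('<M8[ns]') -- plain ndarray, timezone dropped
    pd.DatetimeIndex(s).tz  # keeps US/Eastern, which is what needs to be stored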

Contributor

write_array already handles the objects correctly, so try just passing obj into it (you may need to add a case in write_array, but see where this gets you).
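
A hedged sketch of that suggestion applied to the SeriesFixed.write method shown in the hunk above (not the merged code; write_array may still need a datetime64[ns, tz] case added):

    def write(self, obj, **kwargs):
        super(SeriesFixed, self).write(obj, **kwargs)
        self.write_index('index', obj.index)
        # hand the Series itself to write_array so it can detect the
        # datetime64[ns, tz] dtype, instead of branching on it here
        self.write_array('values', obj)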

self._check_roundtrip(s, tm.assert_series_equal)
@pytest.mark.parametrize('dtype', [
np.int64, np.float64, np.object, 'm8[ns]', 'M8[ns]',
'datetime64[ns, UTC]'
Contributor

Add US/Eastern as well.

s = Series(dtype=dtype)
self._check_roundtrip(s, tm.assert_series_equal)

def test_series_timezone(self):
Contributor

Parametrize both of these with US/Eastern as well.
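
One possible shape for that (a hypothetical sketch; the timezone parametrization is not part of the diff shown here, and Series, date_range, and tm are assumed to be imported as elsewhere in the test module):

    @pytest.mark.parametrize('tz', ['UTC', 'US/Eastern'])
    def test_series_timezone(self, tz):
        # round-trip a timezone-aware Series through HDFStore for each zone
        s = Series(date_range('2018-01-01', periods=3, tz=tz))
        self._check_roundtrip(s, tm.assert_series_equal)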

@jreback jreback added the Bug and IO HDF5 (read_hdf, HDFStore) labels Apr 5, 2018
Contributor

jreback commented Apr 14, 2018

Can you update based on the last comments?

Contributor

jreback commented Sep 25, 2018

Can you rebase and respond to the comments?

Contributor

jreback commented Nov 23, 2018

Closing as stale. Would accept if rebased and comments addressed.

Labels
Bug, IO HDF5 (read_hdf, HDFStore)
Development

Successfully merging this pull request may close these issues.

BUG: HDFStore failures on timezone-aware data
3 participants